AlexU-Word: A New Dataset for Isolated-Word Closed-Vocabulary Offline Arabic Handwriting Recognition
Abstract
In this paper, we introduce a new dataset for offline Arabic handwriting recognition. The aim is to collect a large dataset of isolated Arabic words that covers all letters of the alphabet in all possible shapes using a small number of simple words. The end goal is to obtain a very large database of segmented letter images, which can be used to build and evaluate Arabic handwriting recognition systems that are based on segmented letter recognition. The current version of the dataset contains 25114 samples of 109 unique Arabic words that cover all possible shapes of all alphabet letters. The samples were collected from 907 writers. In its current form, the dataset can be used for the problem of closed-vocabulary word recognition. We evaluated a number of window-based descriptors and classifiers on this task and obtained an accuracy of 92.16% using a SIFT-based descriptor and ANN.
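One way to make the coverage goal concrete is to check which positional letter shapes a candidate word list actually exercises. The following is a minimal Python sketch, not the authors' procedure: the function names (`letter_shapes`, `covered_shapes`), the four shape labels, and the simplified joining classes are assumptions made for illustration, and the input is assumed to be undiacritized Arabic text with no ligature handling.

```python
# Simplified joining classes (an assumption for illustration, not from the paper).
RIGHT_JOINING = set("اأإآدذرزوؤة")  # connect to the previous letter only
NON_JOINING = {"ء"}                  # never connect to a neighbour


def joins_forward(ch):
    """True if ch can connect to the letter that follows it."""
    return ch not in RIGHT_JOINING and ch not in NON_JOINING


def joins_backward(ch):
    """True if ch can connect to the letter that precedes it."""
    return ch not in NON_JOINING


def letter_shapes(word):
    """Return (letter, shape) pairs in logical (reading) order."""
    shapes = []
    for i, ch in enumerate(word):
        # A link exists only if both neighbours are able to form it.
        prev_link = i > 0 and joins_forward(word[i - 1]) and joins_backward(ch)
        next_link = (
            i + 1 < len(word)
            and joins_forward(ch)
            and joins_backward(word[i + 1])
        )
        if prev_link and next_link:
            shape = "medial"
        elif prev_link:
            shape = "final"
        elif next_link:
            shape = "initial"
        else:
            shape = "isolated"
        shapes.append((ch, shape))
    return shapes


def covered_shapes(words):
    """Return the set of distinct (letter, shape) pairs found in a word list."""
    covered = set()
    for word in words:
        covered.update(letter_shapes(word))
    return covered


if __name__ == "__main__":
    # Hypothetical three-word list, not taken from the dataset.
    for pair in sorted(covered_shapes(["كتب", "باب", "دار"])):
        print(pair)
```

Comparing the result of `covered_shapes` against a target set of (letter, shape) pairs would show which shapes a word list still misses. How the dataset's 109 words were actually selected is not stated in the abstract.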
Similar resources
Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
To facilitate the entry of data into the computer and its digitization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...
OHASD: The First On-Line Arabic Sentence Database Handwritten on Tablet PC
In this paper we present the first Arabic sentence dataset for on-line handwriting recognition written on a Tablet PC. The dataset is natural, simple and clear. Texts are sampled from daily newspapers. To collect naturally written handwriting, forms are dictated to writers. The current version of our dataset includes 154 paragraphs written by 48 writers. It contains more than 3800 words and more ...
Rejection measures for handwriting sentence recognition
In this paper we study the use of confidence measures for an on-line handwriting recognizer. We investigate various confidence measures and their integration in an isolated word recognition system as well as in a sentence recognition system. In isolated word recognition tasks, the rejection mechanism is designed in order to reject the outputs of the recognizer that are possibly wrong, which is ...
RWTH OCR: A Large Vocabulary Optical Character Recognition System for Arabic Scripts
We present a novel large vocabulary OCR system, which implements a confidence- and margin-based discriminative training approach for model adaptation of an HMM-based recognition system to handle multiple fonts, different handwriting styles, and their variations. Most current HMM approaches are HTK-based systems which are maximum-likelihood (ML) trained and which try to adapt their model...
On-Line Handwriting Recognition Using Hidden Markov Models
New global information-bearing features improved the modeling of individual letters, thus diminishing the error rate of an HMM-based on-line cursive handwriting recognition system. This system also demonstrated the ability to recognize on-line cursive handwriting in real time. The BYBLOS continuous speech recognition system, a hidden Markov model (HMM) based recognition system, is applied to on...
Journal: CoRR
Volume: abs/1411.4670
Publication date: 2014